Computing the Sparse Matrix Vector Product using Block-Based Kernels Without Zero Padding on Processors with AVX-512 Instructions

نویسندگان

Bérenger Bramas

Pavel Kus

چکیده

The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. It has been shown that block-based kernels are helpful to achieve high performance, but also that they are difficult to use in practice because of the important zero padding they imply. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of the results from the previous executions. We compare the performance of our approach against the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel execution. Finally, we provide the corresponding code in an open source library, called SPC5.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Implementing BLAKE with AVX, AVX2, and XOP

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit singleinstruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-a...

متن کامل

Vector ISA Extension for Sparse Matrix-Vector Multiplication

In this paper we introduce a vector ISA extension to facilitate sparse matrix manipulation on vector processors (VPs). First we introduce a new Block Based Compressed Storage (BBCS) format for sparse matrix representation and a Block-wise Sparse Matrix-Vector Multiplication approach. Additionally, we propose two vector instructions, Multiple Inner Product and Accumulate (MIPA) and LoaD Section ...

متن کامل

AVX Acceleration of DD Arithmetic Between a Sparse Matrix and Vector

High precision arithmetic can improve the convergence of Krylov subspace methods; however, it is very costly. One system of high precision arithmetic is double-double (DD) arithmetic, which uses more than 20 double precision operations for one DD operation. We accelerated DD arithmetic using AVX SIMD instructions. The performances of vector operations in 4 threads are 51-59% of peak performance...

متن کامل

GAMS Index for the NAG Parallel Library

C Elementary and special functions (search also class L5 ) C1 Integer-valued functions (e.g., factorial, binomial coefficient, permutations, combinations, floor, ceiling) C06GXFP Factorizes a positive integer n as n = n1 × n2. This routine may be used in conjunction with C06MCFP D Linear Algebra D1 Elementary vector and matrix operations D1a Elementary vector operations D1a1 Set to constant D1a...

متن کامل

An FMM Based on Dual Tree Traversal for Many-core Architectures

The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogenous architectures. Focus is placed on low accuracy optimizations, in response to the recent interest to use FMM as a preconditioner for sparse linear solvers. A direct comparison with other state-of-the-art fast N -body codes demonstrates th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1801.01134 شماره

صفحات -

تاریخ انتشار 2018

Computing the Sparse Matrix Vector Product using Block-Based Kernels Without Zero Padding on Processors with AVX-512 Instructions

نویسندگان

چکیده

منابع مشابه

Implementing BLAKE with AVX, AVX2, and XOP

Vector ISA Extension for Sparse Matrix-Vector Multiplication

AVX Acceleration of DD Arithmetic Between a Sparse Matrix and Vector

GAMS Index for the NAG Parallel Library

An FMM Based on Dual Tree Traversal for Many-core Architectures

عنوان ژورنال:

اشتراک گذاری